University of Amsterdam
2025-10-29
In this lecture we aim to:
Reading: Chapter 8
\(\LARGE{\text{outcome} = \text{model prediction} + \text{error}}\)
In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables denoted X.
\(\LARGE{Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dotso + \beta_n X_{ni} + \epsilon_i}\)
In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters \(\beta\)’s are estimated from the data.
Source: wikipedia
A selection from Field (8.3 Bias in linear models):
For simple regression
Additionally, for multiple regression
To adhere to the multicollinearity assumption, there must not be a too high linear relation between the predictor variables.
This can be assessed through:

For the linearity assumption to hold, the predictors must have a linear relation to the outcome variable.
This can be checked through:
Predict album sales (1,000 copies) based on airplay (no. plays) and adverts ($1,000).
Predict album sales based on airplay and adverts.
\[{sales}_i = b_0 + b_1 {airplay}_i + b_2 {adverts}_i + \epsilon_i\]
The beta coefficients are:
\(\widehat{\text{album sales}} = b_0 + b_1 \text{airplay} + b_2 \text{adverts}\)
\(\text{model prediction} = \widehat{\text{album sales}}\)
\(\widehat{\text{album sales}} = b_0 + b_1 \text{airplay} + b_2 \text{adverts}\)
\(\widehat{\text{album sales}} = 41.12 + 3.59 \times \text{airplay} + 0.09 \times \text{adverts} = 196.413\)
Is that true?
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[76] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[91] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[106] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[136] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[151] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[166] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[181] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[196] TRUE TRUE TRUE TRUE TRUE
\(r^2\) is the proportion of blue to orange, while \(1 - r^2\) is the proportion of red to orange
The explained variance is the deviation of the estimated model outcome compared to the grand mean.
To get a percentage of explained variance, it must be compared to the total variance. In terms of squares:
\(\frac{{SS}_{model}}{{SS}_{total}}\)
We also call this: \(r^2\) or \(R^2\).
Why?

Scientific & Statistical Reasoning